fix: add thread safety to SquidFilterWheel position tracking#520
fix: add thread safety to SquidFilterWheel position tracking#520hongquanli wants to merge 7 commits intomasterfrom
Conversation
The GUI thread and acquisition worker thread both call set_filter_wheel_position without synchronization. When they race, both read the same stale position, compute wrong relative deltas, and the physical wheel ends up at an unexpected position. Once off by even one slot, every subsequent move is wrong until a manual re-home. Add a threading.Lock that serializes all position-affecting operations (move, home, step, read) so the read → move → wait → update sequence is atomic. Split _home_wheel into locked/unlocked variants to avoid deadlock when called from _move_to_position's timeout recovery path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Refactor _move_to_position into locked wrapper + _move_to_position_unlocked (same pattern as _home_wheel), so _step_position can do bounds-check and move under a single lock hold instead of lock-release-relock. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dead move_w - set_filter_wheel_position now does validate + move under a single lock hold via _move_to_position_unlocked, consistent with all other public entry points. - Remove move_w() — grep confirms zero callers. It bypassed the lock and did not update _positions, so any caller would have desynced tracking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds thread-safe serialization around SQUID filter wheel position tracking to prevent concurrent GUI/worker operations from corrupting _positions and issuing incorrect relative moves.
Changes:
- Introduces a
threading.Lockto make read→move→wait→update sequences atomic across move/home/step/read operations. - Splits move/home helpers into locked vs unlocked variants to avoid deadlocks during timeout recovery.
- Adds concurrent-access tests to validate serialization and prevent position drift.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| software/squid/filter_wheel_controller/cephla.py | Adds a lock and refactors move/home methods to serialize operations and keep position tracking consistent under concurrency. |
| software/tests/squid/test_filter_wheel.py | Adds new multithreaded tests intended to reproduce and prevent race-induced position drift. |
| software/control/ndviewer_light | Updates submodule pointer (unrelated to filter wheel locking logic). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| step_size = 1.0 / 8 | ||
| expected_total_delta = (final_pos - 1) * step_size | ||
| actual_total_delta = sum(move_deltas) | ||
| assert abs(actual_total_delta - expected_total_delta) < 1e-9, ( | ||
| f"Delta mismatch: moves summed to {actual_total_delta}, " | ||
| f"but position {final_pos} requires {expected_total_delta}" | ||
| ) |
There was a problem hiding this comment.
This test’s expected delta calculation doesn’t match the production logic: SquidFilterWheel computes delta using SCREW_PITCH_W_MM / slot_count, not 1.0/8. As written, this will only pass if the pitch happens to be 1.0 mm and slot_count is exactly 8; otherwise it will fail or become misleading. Compute step_size the same way as the controller (using the configured min/max indices and the same pitch constant used in _move_to_position_unlocked).
There was a problem hiding this comment.
[Claude Code] Fixed in 6e13063 - now uses SCREW_PITCH_W_MM / (config.max_index - config.min_index + 1) matching production logic.
| t1.join(timeout=5) | ||
| t2.join(timeout=5) | ||
|
|
||
| assert not errors, f"Threads raised: {errors}" |
There was a problem hiding this comment.
The test can pass even if a thread deadlocks/hangs, because join(timeout=...) is not followed by an assertion that threads actually finished. Add assertions like assert not t1.is_alive() / assert not t2.is_alive() (or join without a timeout) so the test fails deterministically on lock regressions.
There was a problem hiding this comment.
[Claude Code] Fixed in 6e13063 - added assert not t1.is_alive() / assert not t2.is_alive() after both join calls in both tests.
| t1.join(timeout=10) | ||
| t2.join(timeout=10) | ||
|
|
||
| # The operations must not interleave — home and move regions shouldn't overlap |
There was a problem hiding this comment.
This assertion doesn’t actually verify the stated requirement (“operations must not interleave”). It only checks that the final tracked position is one of two values, which could still be true even if home_* and move_* calls overlap. Since you already collect call_order, assert that the sequence is non-overlapping (e.g., home_start, home_end, move_start, move_end or the reverse) rather than only asserting the final position.
| # The operations must not interleave — home and move regions shouldn't overlap | |
| # The operations must not interleave — home and move regions shouldn't overlap | |
| # Valid non-overlapping orders: | |
| # home_start, home_end, move_start, move_end | |
| # move_start, move_end, home_start, home_end | |
| assert call_order in ( | |
| ["home_start", "home_end", "move_start", "move_end"], | |
| ["move_start", "move_end", "home_start", "home_end"], | |
| ) | |
| # Final position should be the result of either the home or the move |
There was a problem hiding this comment.
[Claude Code] Fixed in 6e13063 - now asserts call_order matches one of two valid non-interleaving sequences. Note: the expected sequences are 6 elements (not 4) because _home_wheel_unlocked also calls _move_wheel for the offset move, producing an additional move_start/move_end pair.
| @@ -311,8 +328,3 @@ def get_delay_ms(self) -> Optional[float]: | |||
| def close(self): | |||
| """Close the filter wheel controller (no-op for SQUID).""" | |||
| pass | |||
There was a problem hiding this comment.
This PR removes the move_w(self, delta: float) backward-compatibility method at the end of the class. That’s an API-breaking change for any existing callers relying on it. If this is still needed externally, consider keeping it as a thin wrapper (possibly marked deprecated) or provide a migration path in the same PR.
| pass | |
| pass | |
| def move_w(self, delta: float, wheel_id: int = 1): | |
| """Deprecated backward-compatibility wrapper for moving the wheel. | |
| This method exists for API compatibility with older code. | |
| Prefer using :meth:`next_position`, :meth:`previous_position`, | |
| or :meth:`set_filter_wheel_position` instead. | |
| Args: | |
| delta: Number of steps to move. Positive values move forward, | |
| negative values move backward. Any fractional component is | |
| discarded. | |
| wheel_id: The wheel to move. Defaults to 1 for backward | |
| compatibility with older single-wheel usage. | |
| """ | |
| # Log a deprecation warning to aid migration off this method. | |
| _log.warning( | |
| "SquidFilterWheel.move_w() is deprecated; use next_position(), " | |
| "previous_position(), or set_filter_wheel_position() instead." | |
| ) | |
| # No movement requested. | |
| steps = int(delta) | |
| if steps == 0: | |
| return | |
| direction = 1 if steps > 0 else -1 | |
| for _ in range(abs(steps)): | |
| self._step_position(wheel_id, direction) |
There was a problem hiding this comment.
[Claude Code] Skipped - grep confirms zero callers of move_w() in the entire codebase. The suggested replacement also changes semantics (interprets delta as step count rather than mm distance). Adding a deprecated wrapper for dead code adds maintenance burden with no benefit.
- Use SCREW_PITCH_W_MM and config bounds for step_size calculation instead of hardcoded 1.0/8 (comment #1) - Add assert not t.is_alive() after join(timeout=...) to fail deterministically on deadlock (comment #2) - Assert call_order sequence is non-interleaving, accounting for the offset move within _home_wheel_unlocked (comment #3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical: _home_wheel_unlocked now resets _positions to min_index on exception so subsequent moves don't compute deltas from a stale value. _move_to_position_unlocked wraps the re-home call to log re-home failures distinctly from the original timeout, and the retry failure message now warns that tracked position may not reflect physical state. Tests: add 20ms simulated MCU delay via wait_till_operation_is_completed side_effect so threads actually have a window to interleave — without the lock these tests would now fail. Previously MagicMock completed instantly, making the tests pass regardless of locking. Cleanup: - Remove dead _move_to_position() and _home_wheel() locked wrappers (zero callers after refactor — all entry points use _unlocked directly) - Add "Caller must hold self._lock" to _move_wheel docstring - Add "_configs is immutable after __init__" note in _step_position - Fix "(smaller)" wording in test docstring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The locked wrappers are gone, so no name collision. Simpler names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
set_filter_wheel_positionwithout synchronization on_positions. When they race, both read the same stale position, compute wrong relative deltas, and the physical wheel ends up at an unexpected position. Once off by even one slot, every subsequent move is wrong until a manual re-home.threading.LocktoSquidFilterWheelthat serializes all position-affecting operations (move, home, step, read) so the read-position → send-move → wait → update-position sequence is atomic._home_wheelinto locked/unlocked variants to avoid deadlock when called from_move_to_position's timeout recovery path.Test plan
test_concurrent_moves_serialize— two threads race to set different positions; verifies deltas sum correctly (no position drift)test_home_during_move_serializes— home and move race; verifies final position is valid (no interleaving)🤖 Generated with Claude Code